Shallow Information Extraction from Medical Forum Data

Authors

  • Parikshit Sondhi
  • Manish Gupta
  • ChengXiang Zhai
  • Julia Hockenmaier
Abstract

We study a novel shallow information extraction problem: extracting sentences that belong to a given set of topic categories from medical forum data. Given a corpus of medical forum documents, our goal is to extract two related types of sentences that describe a biomedical case (i.e., medical problem descriptions and medical treatment descriptions). Such an extraction task directly generates medical case descriptions that can be useful in many applications. We solve the problem using two popular machine learning methods, Support Vector Machines (SVM) and Conditional Random Fields (CRF), and propose novel features to improve extraction accuracy. Experimental results show that we can obtain an accuracy of up to 75%.


Similar resources

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...


Patient-Centered Information Extraction for Effective Search on Healthcare Forum

Online healthcare forums are one of the major social media in Health 2.0 for patients and caregivers to share personal experience and to help each other. However, current forums do not support effective information search and thus users are unable to fully leverage the rich information in the forums. In this work, we propose patient-centered information extraction to better organize the informa...


Making Shallow Look Deeper: Anaphora and Comparisons in Medical Information Extraction

The paper focuses on resolving natural language issues that have been affecting the performance of our system for processing Polish medical data. In particular, we address phenomena such as ellipsis, anaphora, comparisons, coordination and negation occurring in mammogram reports. We propose practical data-driven solutions which allow us to improve the system’s performance.


IIT TREC 2007 Genomics Track: Using Concept-Based Semantics in Context for Genomics Literature Passage Retrieval

For the TREC-2007 Genomics Track [1], we explore unsupervised techniques for extracting semantic information about biomedical concepts with a retrieval model for using these semantics in context to improve passage retrieval precision. Dependency grammar analysis is evaluated for boosting the rank of passages where complementary subject/object concept pairs can be identified between queries and ...


Linguistic Processing of Texts Using Geppetto

We describe the linguistic analyzer of a prototype for Information Extraction from texts. This analyzer uses information derived from a shallow processor to limit the computational cost of the analysis. At the same time, shallow techniques are used to collapse parse fragments when a complete parse is not possible. The linguistic analyzer has been built using GePpeTto, an environment that allows...




Publication date: 2010